All Questions
Tagged with machine-learning, natural-language-processing
147 questions
0 votes
3 answers
70 views
Using LLM/AI tools to identify entity types
I am working with data that has a list of organization names, but the "type" of the organization is not given. What I mean by type is that I know that organizations within my list can fall ...
0 votes
0 answers
21 views
Finding Contextual Synonyms that are not necessarily Grammatical Synonyms
I'm trying to learn whether there is a way to use ML to find a list of contextual synonyms for a word in a sentence. I know of some obvious approaches where you mask the word and have some model predict ...
-1 votes
1 answer
64 views
Where is machine learning leading? [closed]
I was looking at the progress of the more popular LLMs over the last few years and wondering whether, in the near future, through the use of semi-exhaustive methods, only those ...
2 votes
2 answers
211 views
Is Llama3 fully open-source, including tokenizer, transformers, and other components needed to build a custom LLM?
I'm trying to understand whether Llama 3 (or other open source models) is fully open-source. Specifically, I would like to know: Is the source code for Llama 3 (including the tokenizer, transformers, ...
5 votes
2 answers
297 views
Are the model implementations in Hugging Face’s transformers library created by the original model authors or by Hugging Face?
I've been exploring the implementation of models like Llama in Hugging Face’s transformers library, for example: Hugging Face's Llama model implementation. I’m ...
4 votes
1 answer
168 views
In the Manifold Hypothesis applied to LLMs, are text sequences points or paths on the manifold?
The Manifold Hypothesis makes a ton of sense to me for images. Images are points in high dimensional space, where each dimension corresponds to the intensity value of a single pixel. For example, we ...
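The "images are points in high-dimensional space" framing in this excerpt is easy to make concrete (a minimal NumPy sketch, not from the original post; the 28x28 size is a hypothetical MNIST-like example):

```python
import numpy as np

# A grayscale "image" of 28x28 pixels (hypothetical random data).
image = np.random.rand(28, 28)

# Viewed as a single point in R^784: each dimension holds one pixel's intensity.
point = image.reshape(-1)
print(point.shape)  # (784,)
```

The Manifold Hypothesis then says natural images occupy only a thin, much lower-dimensional region of that 784-dimensional space.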
3 votes
1 answer
1k views
Why do we use learnable positional encoding instead of sinusoidal positional encoding?
In the original transformer paper, they use positional encoding to capture the position of each word in the sentence, and to calculate it they use sin and cos, as shown in the image. In BERT ...
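The fixed sin/cos scheme this question refers to can be written in a few lines (a minimal NumPy sketch of the standard formula, with `max_len` and `d_model` chosen arbitrarily here):

```python
import numpy as np

def sinusoidal_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(max_len)[:, None]        # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]    # (1, d_model/2) -- even dims
    angles = pos / (10000 ** (i / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions get sin
    pe[:, 1::2] = np.cos(angles)             # odd dimensions get cos
    return pe

pe = sinusoidal_encoding(50, 16)
```

A learnable positional encoding, by contrast, is just a trainable lookup table of the same `(max_len, d_model)` shape, updated by gradient descent like any other embedding.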
1 vote
2 answers
889 views
How are Q, K, V calculated in multi-head attention?
I want to understand the transformer architecture, so I started with self-attention and I understand its mechanism, but when I move on to multi-head attention I find some difficulties, like how ...
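The Q, K, V computation this question asks about can be sketched directly (a minimal NumPy sketch; the dimensions and random weights are illustrative, not from any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads

x = rng.standard_normal((seq_len, d_model))  # token embeddings

# One learned projection per role; the heads come from splitting the result.
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

# Project, then reshape (seq, heads, d_head) -> (heads, seq, d_head).
Q = (x @ W_q).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
K = (x @ W_k).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
V = (x @ W_v).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

# Scaled dot-product attention, computed independently per head.
scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)    # softmax over keys

# Concatenate heads back into a (seq_len, d_model) output.
out = (weights @ V).transpose(1, 0, 2).reshape(seq_len, d_model)
```

So each head does not get its own input: all heads share the same x and the same projection matrices here, and the "multi-head" part is just the split into `d_head`-sized slices attended over independently.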
4 votes
2 answers
1k views
Why does different noise in a GAN generate different images?
I understand that noise $z$ serves as the input to the generator. Noise $z$ is essentially a vector of random numbers, typically drawn from a Gaussian distribution, with a chosen size like $100$. However, I ...
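The key point behind this question can be shown with a toy stand-in for the generator (a minimal sketch; a real generator is a deep network, and the linear map below is only a hypothetical placeholder):

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.standard_normal((100, 784)) * 0.01   # stand-in "generator" weights

def generator(z):
    # The mapping itself is deterministic: all randomness comes from z.
    return np.tanh(z @ W)

z1 = rng.standard_normal(100)   # one Gaussian noise vector of size 100
z2 = rng.standard_normal(100)   # a different draw

same = generator(z1)
again = generator(z1)           # identical to `same`: same z, same output
other = generator(z2)           # different z, different output
```

That is the whole mechanism: the generator is a fixed (after training) deterministic function, so distinct samples can only come from distinct $z$ vectors.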
1 vote
1 answer
140 views
Fine-tuning, just feature extraction, or both using RoBERTa?
I'm reading a program that uses the pre-trained RoBERTa model (roberta-base). The code first extracts word embeddings from each caption in the batch, using the last hidden state of the RoBERTa model. ...
2 votes
2 answers
1k views
What technique is used for training Large Language Models like GPT?
I'm learning about GenAI, such as GPT (Generative Pretrained Transformer), and I'm particularly interested in understanding the training techniques used for these models. Deep learning, generally, can ...
0 votes
0 answers
60 views
Understanding the concept of embeddings in the RoBERTa architecture
I'm reading the implementation file of the RoBERTa architecture, specifically the RobertaEmbedding class; this class has the comment: ...
0 votes
1 answer
222 views
How can I interpret the attention weights matrix? Is it reliable?
I've fine-tuned two different models (BERT and RoBERTa) on a dataset for a binary classification task, and I'm comparing the sentences where the models predict incorrectly. I decided to use attention weights ...
0 votes
1 answer
134 views
Using naive Bayes vs. a transformer-based model for human-annotated data?
I have a Reddit dataset with thousands of online posts about the economy and inflation. We have used human annotation on 60% of the posts to determine whether users blame the following entities over the ...
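The naive Bayes baseline mentioned in this question fits in a few lines of plain Python (a minimal sketch with toy, hypothetical posts and labels; the real dataset and label set are not shown in the excerpt):

```python
from collections import Counter, defaultdict
import math

# Toy annotated posts (hypothetical labels: does the post blame the government?).
train = [
    ("the government caused this inflation", "blame"),
    ("government policy drove prices up", "blame"),
    ("supply chains are the real issue", "no_blame"),
    ("global markets explain the prices", "no_blame"),
]

# Per-class priors and word counts.
class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def predict(text):
    # argmax over classes of log P(class) + sum log P(word | class),
    # with add-one (Laplace) smoothing for unseen words.
    best, best_score = None, -math.inf
    for label, c in class_counts.items():
        score = math.log(c / len(train))
        total = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

pred = predict("the government caused rising prices")
```

With only a few thousand annotated posts, a bag-of-words baseline like this is often worth running first, since it gives a floor that a fine-tuned transformer must beat to justify its cost.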
2 votes
1 answer
136 views
NLP "small" model to improve "big" model
When training a model for NLP, is it important to get rid of data which has "bad semantics" for the learning process? My plan is to create a "small model" which can decide whether data ...